Individual decision-makers consume information revealed by the previous decision-makers, and produce information that may help in future decisions. This phenomenon is common in a wide range of scenarios in the Internet economy, as well as elsewhere, such as medical decisions. Each decision-maker would individually prefer to "exploit": select an action with the highest expected reward given her current information. At the same time, each decision-maker would prefer previous decision makes to "explore", producing information about the rewards of various actions. A social planner, by means of carefully designed information disclosure, can incentivize the agents to balance the exploration and exploitation so as to maximize social welfare.
We model this as a multi-arm bandit problem under Bayesian incentive-compatibility constraints induced by agents' priors. Among other results, we provide a black-box reduction from an arbitrary algorithm to an incentive-compatible one, with only a small increase in regret. Further, we consider an extension where multiple agents participate in each round of the bandit problem, affecting each other's utilities.
Joint work with Yishay Mansour (Tel Aviv University and MSR Israel), Vasilis Syrgkanis (MSR-NE) and Steven Wu (MSR-NYC).
.